Scalable Fault Tolerance

Author

  • Shay Kutten
Abstract

As communication networks grow, existing fault-handling tools become increasingly unaffordable. In many cases the reason is that they involve global measures, such as global time-outs or reset procedures, whose cost grows with the size of the network. For a fault-handling mechanism to scale to large networks, it should instead involve local measures or, at worst, fault-local measures, i.e., measures whose cost depends only on the number of failed nodes (which, thanks to today's technology, grows much more slowly than the networks themselves). This decreases the recovery time and, moreover, often allows the non-faulty regions of the network to continue operating even during the recovery of the faulty parts. We describe several research ideas that lead in this direction.


Similar resources

A Scalable Byzantine Fault Tolerant Service in Grid System

This paper describes the design, implementation and usage of a secure, scalable Byzantine fault tolerant MDS system in the Grid. The scalable Byzantine fault tolerant MDS system provides a hierarchy of GIIS servers; a local GIIS domain can request the resources it needs from a remote GIIS domain. By using the state-machine replication approach and the quorum system technique, the scalable Byzantine fault t...

JSEB (Java Scalable sErvices Builder): Scalable Systems for Clusters of Workstations

We present a report on JSEB (Java Scalable Service Builder) whose goal is to offer programmers a tool that can be used to efficiently add scalability and fault-tolerance to a replicated service in cluster(s) of workstations.

Achieving On-chip Fault-tolerance Utilizing BIST Resources

Widespread reliability challenges are expected for 65nm and below VLSI fabrication technologies. Effective and efficient on-chip fault-tolerance solutions are required to counter these challenges. A new post-fabrication, reconfigurable and scalable approach to achieving on-chip fault-tolerance, using built-in self-test (BIST) resources, has been proposed. This paper describes the approach a...

Run-Through Stabilization: An MPI Proposal for Process Fault Tolerance

The MPI standard lacks semantics and interfaces for sustained application execution in the presence of process failures. Exascale HPC systems may require scalable, fault resilient MPI applications. The mission of the MPI Forum’s Fault Tolerance Working Group is to enhance the standard to enable the development of scalable, fault tolerant HPC applications. This paper presents an overview of the ...

Scalable and Secure Data Collection: Fault Tolerance Considerations

Data collection, or uploading, is an inherent part of numerous digital government applications. In this poster we present our recent research directions in the development of Bistro, a scalable and secure architecture designed for collection of data over the Internet for digital government applications.

Large-Scale Computation Not at the Cost of Expressiveness

We present Celias, a new concurrent programming model for data-intensive scalable computing. Celias supports many virtues commonly found in existing distributed programming frameworks, such as elastic scaling and fault tolerance, without sacrificing expressiveness. The key design idea of Celias is the concept of a microtask, as a scalable, fault-tolerant, and completely data-driven unit of comp...

Publication date: 1996